Abstract Provenance Graphs: Anticipating and Exploiting Schema-Level Data Provenance
Authors
Daniel Zinn, Bertram Ludäscher ({dzinn,ludaesch}@ucdavis.edu)
Abstract
Provenance graphs capture flow and dependency information recorded during scientific workflow runs, which can be used subsequently to interpret, validate, and debug workflow results. In this paper, we propose a new concept, called abstract provenance graph (APG). APGs are created via static analysis of a configured workflow W and input data schema, i.e., before the workflow is actually executed. They summarize all possible provenance graphs the workflow W can create with input data of type τ; that is, for each input v ∈ τ there exists a graph homomorphism H_v between the concrete and abstract provenance graph. APGs are helpful during workflow construction since (1) they make certain workflow design bugs (e.g., selecting no input data or the wrong input data for the actors) easy to spot; and (2) they show the evolution of the overall data organization of a workflow. Moreover, after workflows have been run, APGs can be used to validate concrete provenance graphs. A more detailed version of this work is available as [12].
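The homomorphism condition mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `is_homomorphism`, the edge-list representation, and the toy node names are all assumptions made for illustration. The sketch only checks that a given node mapping sends every edge of a concrete provenance graph onto an edge of the abstract graph.

```python
def is_homomorphism(concrete_edges, abstract_edges, h):
    """Return True if the node mapping h sends every concrete edge
    (u, v) to an edge (h[u], h[v]) of the abstract graph.

    All names here are hypothetical; graphs are plain lists of
    directed (source, target) pairs.
    """
    abstract = set(abstract_edges)
    return all(
        u in h and v in h and (h[u], h[v]) in abstract
        for u, v in concrete_edges
    )


# Toy abstract graph: input data flows to a filter step, then to output.
abstract_edges = [("input", "filter"), ("filter", "output")]

# Toy concrete run: two inputs were filtered, one result was produced.
concrete_edges = [("in1", "f1"), ("in2", "f2"), ("f1", "out1")]
h = {
    "in1": "input", "in2": "input",
    "f1": "filter", "f2": "filter",
    "out1": "output",
}

valid = is_homomorphism(concrete_edges, abstract_edges, h)

# A dependency that skips the filter step has no counterpart
# in the abstract graph, so the check fails.
bad_edges = concrete_edges + [("in1", "out1")]
invalid = is_homomorphism(bad_edges, abstract_edges, h)
```

In this toy setting, a failed check flags a concrete graph containing a dependency that the abstract graph does not anticipate, which is the sense in which an APG could be used to validate a recorded run.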
Similar resources
On Explicit Provenance Management in RDF/S Graphs
The notion of RDF Named Graphs has been proposed in order to assign provenance information to data described using RDF triples. In this paper, we argue that named graphs alone cannot capture provenance information in the presence of RDFS reasoning and updates. In order to address this problem, we introduce the notion of RDF/S Graphsets: a graphset is associated with a set of RDF named graphs an...
Database Support for Exploring Scientific Workflow Provenance Graphs
Provenance graphs generated from real-world scientific workflows often contain large numbers of nodes and edges denoting various types of provenance information. A standard approach used by workflow systems is to visually present provenance information by displaying an entire (static) provenance graph. This approach makes it difficult for users to find relevant information and to explore and an...
OPQL: Querying scientific workflow provenance at the graph level
Article history: received 21 December 2011; received in revised form 30 August 2013; accepted 31 August 2013; available online xxxx. Provenance has become increasingly important in scientific workflows to understand, verify, and reproduce the result of scientific data analysis. Most existing systems store provenance data in provenance stores with proprietary provenance data models and conduct query...
A Comprehensive Model for Provenance
In this paper, we propose a provenance model able to represent the provenance of any data object captured at any abstraction layer (workflow/process/OS) and present an abstract schema of the model. The expressive nature of the model gives it the potential to be used in real-world data processing systems.
What’s in a name? Exploiting URIs to enrich provenance explanations in plain English
Provenance allows decision-makers to evaluate the importance of pieces of data. PROV is the standardised model of provenance for use on the web, particularly suited for situations where data is generated by systems under distributed control, such as in coalition operations. If human decision-makers are to make effective use of provenance data, they need to understand it, and this work establish...